Protein folding is the physical process by which a protein, after synthesis by a ribosome as a linear chain of amino acids, changes from an unstable random coil into a more ordered three-dimensional structure. This structure permits the protein to become biologically functional or active.
The folding of many proteins begins even during the translation of the polypeptide chain. The amino acids interact with each other to produce a well-defined three-dimensional structure, known as the protein's native state. This structure is determined by the amino-acid sequence or primary structure.
The correct three-dimensional structure is essential to function, although some parts of functional proteins may remain unfolded.
Denaturation of proteins is the transition from a folded state to a random coil. It happens in cooking, proteinopathy, and other contexts. Residual structure present, if any, in the supposedly unfolded state may form a folding initiation site and guide the subsequent folding reactions.
The duration of the folding process varies dramatically depending on the protein of interest. When studied in vitro, the slowest folding proteins require many minutes or hours to fold, primarily due to proline isomerization, and must pass through a number of intermediate states, like checkpoints, before the process is complete. On the other hand, very small single-domain proteins with lengths of up to a hundred amino acids typically fold in a single step. Time scales of milliseconds are the norm, and the fastest known protein folding reactions are complete within a few microseconds. The folding time scale of a protein depends on its size, contact order, and circuit topology.
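Contact order can be made concrete with a small calculation. The sketch below computes a relative contact order, the average sequence separation between contacting residues divided by chain length. It is only an illustration under stated assumptions: the function name, the use of alpha-carbon coordinates, the 8 Å cutoff and the `min_separation` parameter are choices made here, and published definitions typically count contacts between all heavy atoms with a tighter cutoff.

```python
import math


def relative_contact_order(ca_coords, cutoff=8.0, min_separation=1):
    """Relative contact order from alpha-carbon coordinates (in angstroms).

    Assumption: two residues are 'in contact' when their alpha carbons lie
    within `cutoff` angstroms. This is a simplification chosen for the sketch.
    """
    n_res = len(ca_coords)
    total_separation = 0
    n_contacts = 0
    for i in range(n_res):
        for j in range(i + min_separation, n_res):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                total_separation += j - i  # sequence distance of this contact
                n_contacts += 1
    if n_contacts == 0:
        return 0.0
    # Average sequence separation of contacts, normalized by chain length.
    return total_separation / (n_res * n_contacts)


# Hypothetical fully extended 10-residue chain with 3.8 angstrom spacing:
# only near neighbors are in contact, so the contact order is low.
extended_chain = [(3.8 * i, 0.0, 0.0) for i in range(10)]
rco_extended = relative_contact_order(extended_chain)
```

A low value means contacts are mostly local in sequence, while a higher value means many contacts join residues far apart in the chain.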
Understanding and simulating the protein folding process has been an important challenge for computational biology since the late 1960s.
The conformations a protein can adopt are limited by the restricted bending angles of the polypeptide backbone. The allowable angles are described with a two-dimensional plot known as the Ramachandran plot, which is drawn with the psi and phi angles of allowable rotation.
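The phi and psi angles are dihedral (torsion) angles, each defined by four consecutive backbone atoms. As a minimal sketch, the helper below computes a signed dihedral angle from four 3D points using the standard atan2 formulation. The function names and the example coordinates are illustrative only and are not taken from any particular structural biology library.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees (-180 to 180) for four points."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy geometry with the central bond along the z axis.
angle_cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
angle_plus90 = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
angle_trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
```

For residue i, phi would be `dihedral(C[i-1], N[i], CA[i], C[i])` and psi would be `dihedral(N[i], CA[i], C[i], N[i+1])`. Plotting each residue's (phi, psi) pair gives the points of a Ramachandran plot.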
Minimizing the number of hydrophobic side chains exposed to water is an important driving force behind the folding process. The hydrophobic effect is the phenomenon in which the hydrophobic chains of a protein collapse into the core of the protein (away from the hydrophilic environment). In an aqueous environment, the water molecules tend to aggregate around the hydrophobic regions or side chains of the protein, creating water shells of ordered water molecules. An ordering of water molecules around a hydrophobic region increases order in a system and therefore contributes a negative change in entropy (less entropy in the system). The water molecules are fixed in these water cages, which drives the hydrophobic collapse, or the inward folding of the hydrophobic groups. The hydrophobic collapse returns entropy to the system by breaking the water cages, which frees the ordered water molecules. The multitude of hydrophobic groups interacting within the core of the globular folded protein contributes significantly to protein stability after folding, because of the large accumulated van der Waals forces (specifically London dispersion forces). The hydrophobic effect exists as a thermodynamic driving force only in the presence of an aqueous medium and an amphiphilic molecule containing a large hydrophobic region. The strength of hydrogen bonds depends on their environment; thus, H-bonds enveloped in a hydrophobic core contribute more to the stability of the native state than H-bonds exposed to the aqueous environment.
In proteins with globular folds, hydrophobic amino acids tend to be interspersed along the primary sequence, rather than randomly distributed or clustered together. However, proteins that have recently been born de novo, which tend to be intrinsically disordered, show the opposite pattern of hydrophobic amino acid clustering along the primary sequence.
A fully denatured protein lacks both tertiary and secondary structure, and exists as a so-called random coil. Under certain conditions some proteins can refold; however, in many cases, denaturation is irreversible. Cells sometimes protect their proteins against the denaturing influence of heat with proteins known as heat shock proteins (a type of chaperone), which assist other proteins both in folding and in remaining folded. Heat shock proteins have been found in all species examined, from bacteria to humans, suggesting that they evolved very early and have an important function. Some proteins never fold in cells at all except with the assistance of chaperones, which either isolate individual proteins so that their folding is not interrupted by interactions with other proteins, or help to unfold misfolded proteins, allowing them to refold into the correct native structure. This function is crucial to prevent the risk of precipitation into insoluble amorphous aggregates. The external factors involved in protein denaturation or disruption of the native state include temperature, external fields (electric, magnetic), molecular crowding, and even the limitation of space (i.e. confinement), which can have a big influence on the folding of proteins. High concentrations of solutes, extremes of pH, mechanical forces, and the presence of chemical denaturants can contribute to protein denaturation as well. These individual factors are categorized together as stresses. Chaperones are shown to exist in increasing concentrations during times of cellular stress and help the proper folding of emerging proteins as well as denatured or misfolded ones.
Under some conditions proteins will not fold into their biochemically functional forms. Temperatures above or below the range that cells tend to live in will cause thermally unstable proteins to unfold or denature (this is why boiling makes an egg white turn opaque). Protein thermal stability is far from constant, however; for example, hyperthermophilic microorganisms have been found that grow at temperatures as high as 122 °C, which requires that their full complement of vital proteins and protein assemblies be stable at that temperature or above.
The bacterium E. coli is the host for bacteriophage T4, and the phage-encoded gp31 protein appears to be structurally and functionally homologous to the E. coli chaperone protein GroES and able to substitute for it in the assembly of bacteriophage T4 virus particles during infection. Like GroES, gp31 forms a stable complex with the GroEL chaperonin that is absolutely necessary for the folding and assembly in vivo of the bacteriophage T4 major capsid protein gp23.
Aggregated proteins are associated with prion-related illnesses such as Creutzfeldt–Jakob disease and bovine spongiform encephalopathy (mad cow disease), amyloid-related illnesses such as Alzheimer's disease and familial amyloid cardiomyopathy or polyneuropathy, and intracellular aggregation diseases such as Huntington's and Parkinson's disease. These age-onset degenerative diseases are associated with the aggregation of misfolded proteins into insoluble, extracellular aggregates and/or intracellular inclusions, including cross-β amyloid. It is not completely clear whether the aggregates are the cause or merely a reflection of the loss of protein homeostasis, the balance between synthesis, folding, aggregation and protein turnover. Recently the European Medicines Agency approved the use of tafamidis (Vyndaqel), a kinetic stabilizer of tetrameric transthyretin, for the treatment of transthyretin amyloid diseases. This suggests that the process of amyloid fibril formation (and not the fibrils themselves) causes the degeneration of post-mitotic tissue in human amyloid diseases. Misfolding and excessive degradation instead of folding and function lead to a number of proteopathy diseases such as antitrypsin-associated emphysema, cystic fibrosis and the lysosomal storage diseases, where loss of function is the origin of the disorder. While protein replacement therapy has historically been used to correct the latter disorders, an emerging approach is to use pharmaceutical chaperones to fold mutated proteins and render them functional.
Fluorescence spectroscopy can be used to characterize the equilibrium unfolding of proteins by measuring the variation in the intensity of fluorescence emission or in the wavelength of maximal emission as functions of a denaturant value. The denaturant can be a chemical molecule (urea, guanidinium hydrochloride), temperature, pH, pressure, etc. The equilibrium between the different but discrete protein states, i.e. native state, intermediate states, unfolded state, depends on the denaturant value; therefore, the global fluorescence signal of their equilibrium mixture also depends on this value. One thus obtains a profile relating the global protein signal to the denaturant value. The profile of equilibrium unfolding may enable one to detect and identify intermediates of unfolding. General equations have been developed by Hugues Bedouelle to obtain the thermodynamic parameters that characterize the unfolding equilibria for homomeric or heteromeric proteins, up to trimers and potentially tetramers, from such profiles. Fluorescence spectroscopy can be combined with fast-mixing devices such as stopped flow, to measure protein folding kinetics, generate a chevron plot and derive a Phi value analysis.
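The dependence of the global signal on the denaturant value can be illustrated with the simplest possible case. The sketch below assumes a monomeric protein with only two states (native and unfolded) and a free energy of unfolding that decreases linearly with denaturant concentration. These are assumptions of the example, not the general equations for oligomeric proteins mentioned above, and the function names and parameter values are hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)


def fraction_unfolded(denaturant, dg_water, m_value, temperature=298.15):
    """Fraction of unfolded protein in a two-state model.

    Assumes dG_unfolding = dg_water - m_value * denaturant (linear model),
    with dg_water in kJ/mol and m_value in kJ/(mol*M).
    """
    dg = dg_water - m_value * denaturant
    k_unfold = math.exp(-dg / (R_KJ * temperature))  # [U]/[N] at equilibrium
    return k_unfold / (1.0 + k_unfold)


def equilibrium_signal(denaturant, dg_water, m_value,
                       signal_native, signal_unfolded, temperature=298.15):
    """Population-weighted signal of the native/unfolded mixture."""
    f_u = fraction_unfolded(denaturant, dg_water, m_value, temperature)
    return (1.0 - f_u) * signal_native + f_u * signal_unfolded


# Hypothetical parameters: dG(H2O) = 20 kJ/mol, m = 5 kJ/(mol*M).
profile = [(c / 2, equilibrium_signal(c / 2, 20.0, 5.0, 100.0, 30.0))
           for c in range(0, 17)]  # 0 to 8 M in 0.5 M steps
midpoint_fraction = fraction_unfolded(4.0, 20.0, 5.0)
```

In this model the midpoint of the transition lies at `dg_water / m_value` (4 M here), where half of the molecules are unfolded. Fitting measured profiles to such a curve is how the thermodynamic parameters would be estimated in practice; intermediates would show up as systematic deviations from the two-state shape.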
Because protein folding takes place at rates of about 50 to 3000 s−1, CPMG relaxation dispersion and chemical exchange saturation transfer have become some of the primary techniques for NMR analysis of folding. In addition, both techniques are used to uncover excited intermediate states in the protein folding landscape. To do this, CPMG relaxation dispersion takes advantage of the spin echo phenomenon. This technique exposes the target nuclei to a 90° pulse followed by one or more 180° pulses. As the nuclei refocus, a broad distribution indicates that the target nuclei are involved in an intermediate excited state. Relaxation dispersion plots provide information on the thermodynamics and kinetics of exchange between the excited and ground states. Saturation transfer measures changes in signal from the ground state as excited states become perturbed. It uses weak radio frequency irradiation to saturate the excited state of a particular nucleus, which transfers its saturation to the ground state. This signal is amplified by decreasing the magnetization (and the signal) of the ground state.
The main limitations of NMR are that its resolution decreases for proteins larger than 25 kDa and that it is not as detailed as X-ray crystallography. Additionally, protein NMR analysis is quite difficult and can yield multiple solutions from the same NMR spectrum.
In a study focused on the folding of SOD1, a protein involved in amyotrophic lateral sclerosis, excited intermediates were studied with relaxation dispersion and saturation transfer. SOD1 had previously been tied to many disease-causing mutants that were assumed to be involved in protein aggregation; however, the mechanism was still unknown. Relaxation dispersion and saturation transfer experiments uncovered many excited intermediate states associated with misfolding in the SOD1 mutants.
A consequence of these evolutionarily selected sequences is that proteins are generally thought to have globally "funneled energy landscapes" (a term coined by José Onuchic) that are largely directed toward the native state. This "folding funnel" landscape allows the protein to fold to the native state through any of a large number of pathways and intermediates, rather than being restricted to a single mechanism. The theory is supported by both lattice protein and experimental studies, and it has been used to improve methods for protein structure prediction and protein design. The description of protein folding by the leveling free-energy landscape is also consistent with the 2nd law of thermodynamics. Physically, thinking of landscapes in terms of visualizable potential or total energy surfaces simply with maxima, saddle points, minima, and funnels, rather like geographic landscapes, is perhaps a little misleading. The relevant description is really a high-dimensional phase space in which manifolds might take a variety of more complicated topological forms.
The unfolded polypeptide chain begins at the top of the funnel, where it may assume the largest number of unfolded variations and is in its highest energy state. Energy landscapes such as these indicate that there are a large number of initial possibilities but only a single native state; however, they do not reveal the numerous folding pathways that are possible. Different molecules of the same protein may follow marginally different folding pathways, seeking different lower energy intermediates, as long as the same native structure is reached. Different pathways may have different frequencies of utilization depending on the thermodynamic favorability of each pathway. This means that if one pathway is more thermodynamically favorable than another, it is likely to be used more frequently in the pursuit of the native structure. As the protein begins to fold and assume its various conformations, it always seeks a more thermodynamically favorable structure than before and thus continues through the energy funnel. Formation of secondary structures is a strong indication of increased stability within the protein, and only one combination of secondary structures assumed by the polypeptide backbone will have the lowest energy and therefore be present in the native state of the protein. Among the first structures to form once the polypeptide begins to fold are alpha helices and beta turns; alpha helices can form in as little as 100 nanoseconds and beta turns in 1 microsecond.
There exists a saddle point in the energy funnel landscape where the transition state for a particular protein is found. The transition state in the energy funnel diagram is the conformation that must be assumed by every molecule of that protein if it is to reach the native structure. No protein may assume the native structure without first passing through the transition state. The transition state can be regarded as a variant or premature form of the native state rather than just another intermediary step. The folding of the transition state is shown to be rate-determining, and even though it exists in a higher energy state than the native fold, it greatly resembles the native structure. Within the transition state, there exists a nucleus around which the protein is able to fold, formed by a process referred to as "nucleation condensation", in which the structure begins to collapse onto the nucleus.
De novo or ab initio techniques for computational protein structure prediction can be used for simulating various aspects of protein folding. Molecular dynamics (MD) has been used in simulations of protein folding and dynamics in silico. The first equilibrium folding simulations were done using an implicit solvent model and umbrella sampling. Because of computational cost, ab initio MD folding simulations with explicit water are limited to peptides and small proteins. MD simulations of larger proteins remain restricted to dynamics of the experimental structure or its high-temperature unfolding. Long-time folding processes (beyond about 1 millisecond), such as the folding of larger proteins (>150 residues), can be accessed using coarse-grained models.
Several large-scale computational projects, such as Rosetta@home, Folding@home and Foldit, target protein folding.
Long continuous-trajectory simulations have been performed on Anton, a massively parallel supercomputer designed and built around custom ASICs and interconnects by D. E. Shaw Research. The longest published result of a simulation performed using Anton as of 2011 was a 2.936 millisecond simulation of NTL9 at 355 K. Such simulations are currently able to unfold and refold small proteins (<150 amino acid residues) in equilibrium and predict how mutations affect folding kinetics and stability.
In 2020 a team of researchers that used AlphaFold, an artificial intelligence (AI) protein structure prediction program developed by DeepMind, placed first in CASP, a long-standing structure prediction contest. The team achieved a level of accuracy much higher than any other group. It scored above 90% for around two-thirds of the proteins in CASP's global distance test (GDT), a test that measures the degree of similarity between the structure predicted by a computational program and the empirical structure determined experimentally in a lab. A score of 100 is considered a complete match, within the distance cutoff used for calculating GDT. (Robert F. Service, "'The game has changed.' AI triumphs at solving protein structures", Science, 30 November 2020.)
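To show how a GDT-style score relates to distance cutoffs, the sketch below computes the commonly used GDT_TS variant: the average, over cutoffs of 1, 2, 4 and 8 Å, of the percentage of residues whose predicted position lies within the cutoff of the experimental position. It is a simplification under stated assumptions. The function name is hypothetical, the per-residue distances are taken as given from one superposition, and the real procedure searches over many superpositions to maximize the count at each cutoff.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS (0 to 100) from per-residue distances in angstroms.

    Assumption: `distances` holds the distance between each predicted and
    experimental alpha-carbon position after a single, already chosen
    superposition of the two structures.
    """
    n_res = len(distances)
    if n_res == 0:
        raise ValueError("at least one residue distance is required")
    fractions = [
        sum(1 for d in distances if d <= cutoff) / n_res
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical four-residue example: one residue is badly misplaced.
score_example = gdt_ts([0.5, 1.5, 3.0, 9.0])
score_perfect = gdt_ts([0.0, 0.2, 0.4])
```

Because every residue in the second example is within the tightest cutoff, it reaches the maximum score, which corresponds to the "complete match, within the distance cutoff" described above.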
AlphaFold's protein structure prediction results at CASP were described as "transformational" and "astounding". Some researchers noted that the accuracy is not high enough for a third of its predictions, and that it does not reveal the physical mechanism of protein folding, so the protein folding problem cannot yet be considered solved. Nevertheless, it is considered a significant achievement in computational biology and great progress towards a decades-old grand challenge of biology, predicting the structure of proteins.
Fold switching
Protein misfolding and neurodegenerative disease
Experimental techniques for studying protein folding
X-ray crystallography
Fluorescence spectroscopy
Circular dichroism
Vibrational circular dichroism of proteins
Protein nuclear magnetic resonance spectroscopy
Dual-polarization interferometry
Studies of folding with high time resolution
Proteolysis
Single-molecule force spectroscopy
Biotin painting
Computational studies of protein folding
Levinthal's paradox
Energy landscape of protein folding
Modeling of protein folding
Figure (ACBP Markov state model from Folding@home): Folding@home uses Markov state models to model the possible shapes and folding pathways a protein can take as it condenses from its initial randomly coiled state into its native 3D structure.